Learning Probabilistic Residual Finite State Automata
Abstract
We introduce a new class of probabilistic automata: Probabilistic Residual Finite State Automata (PRFA). We show that this class can be characterized by a simple intrinsic property of the stochastic languages they generate (the set of residual languages is finitely generated) and that it admits canonical minimal forms. We prove that the class of languages generated by PRFA is strictly larger than the class generated by Probabilistic Deterministic Finite Automata (PDFA). We present a first inference algorithm using this representation, and we show that stochastic languages represented by PRFA can be identified from a characteristic sample if words are provided with their probabilities of appearance in the target language.
Introduction
In the field of machine learning, most realistic situations involve data produced by a stochastic source, and probabilistic models such as Hidden Markov Models (HMMs) or probabilistic automata (PA) are becoming increasingly important. Speech recognition, computational biology and, more generally, every field that needs statistical sequence analysis may use these kinds of models. In this paper, we focus on probabilistic automata. A probabilistic automaton can be described by its structure (a finite state automaton) and by a set of continuous parameters (the probability of emitting a given letter from a given state, or of ending the generation process). There exist several fairly good methods for adjusting the continuous parameters of a given structure to a training set of examples. However, efficiently building the structure from data is still an open problem. Hence most applications of HMMs or PA assume a fixed model structure, which is either chosen as general as possible (i.e. a complete graph) or selected a priori using domain knowledge.
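The split between a finite structure and continuous emission/stopping probabilities can be made concrete with a small sketch. It assumes the common matrix formulation of a probabilistic automaton: an initial vector ι, one matrix M_a per letter a, and a termination vector τ, so that P(w) = ι M_{w1} … M_{wk} τ. It also assumes the usual definition of the residual language of P with respect to a prefix u, namely u⁻¹P(w) = P(uw) / P(uΣ*), which is the kind of residual the abstract refers to. The class name `ProbabilisticAutomaton`, its methods, and the two-state example are hypothetical illustrations; this is not the authors' PRFA inference algorithm.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


class ProbabilisticAutomaton:
    """Hypothetical matrix representation of a (possibly non-deterministic)
    probabilistic automaton with states 0..n-1."""

    def __init__(self, init: List[float], trans: Dict[str, Matrix],
                 final: List[float], tol: float = 1e-9) -> None:
        self.n = len(init)
        self.init = init    # init[q]: probability of starting in state q
        self.trans = trans  # trans[a][q][r]: probability of emitting a and moving q -> r
        self.final = final  # final[q]: probability of ending the generation in q
        # Each state must split probability 1 between emitting letters and stopping.
        for q in range(self.n):
            mass = final[q] + sum(m[q][r] for m in trans.values() for r in range(self.n))
            if abs(mass - 1.0) > tol:
                raise ValueError(f"state {q} has outgoing mass {mass}, expected 1")

    def _forward(self, vec: List[float], word: Sequence[str]) -> List[float]:
        """Multiply the row vector by M_a for each letter a of the word."""
        for a in word:
            m = self.trans.get(a)
            if m is None:  # letter outside the alphabet: no path emits it
                return [0.0] * self.n
            vec = [sum(vec[q] * m[q][r] for q in range(self.n)) for r in range(self.n)]
        return vec

    def prob(self, word: Sequence[str]) -> float:
        """P(w) = init . M_{w1} ... M_{wk} . final"""
        v = self._forward(self.init, word)
        return sum(x * f for x, f in zip(v, self.final))

    def prefix_prob(self, prefix: Sequence[str]) -> float:
        """P(u Sigma*): total mass of the words starting with u.
        Assumes generation stops with probability 1 from every state, so all
        mass reaching a state after reading u is spent on continuations."""
        return sum(self._forward(self.init, prefix))

    def residual_prob(self, prefix: Sequence[str], word: Sequence[str]) -> float:
        """Assumed residual definition: u^{-1}P(w) = P(uw) / P(u Sigma*)."""
        denom = self.prefix_prob(prefix)
        if denom == 0.0:
            raise ValueError("P(u Sigma*) = 0: the residual is undefined")
        return self.prob(list(prefix) + list(word)) / denom


if __name__ == "__main__":
    # Two-state example over {a, b}; reading "a" in state 0 is non-deterministic.
    pa = ProbabilisticAutomaton(
        init=[1.0, 0.0],
        trans={"a": [[0.3, 0.3], [0.0, 0.0]],
               "b": [[0.2, 0.0], [0.0, 0.5]]},
        final=[0.2, 0.5],
    )
    print(pa.prob("ab"))                # P(ab)
    print(pa.residual_prob("a", "b"))   # (a^{-1}P)(b) = P(ab) / P(a Sigma*)
```

Since the residual only divides by the constant P(uΣ*), u⁻¹P is generated by the same transition and termination probabilities started from the normalized vector ι M_u / P(uΣ*); in the example, that vector is [0.5, 0.5] for u = a.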
Several learning algorithms, based on previous work in the field of grammatical inference, have been designed to output a deterministic structure (Probabilistic Deterministic Finite State Automata, PDFA) from training data ([1], [2], [3]; see also [4], [5] for early works), and several interesting theoretical and experimental results have been obtained. However, unlike the case of non-stochastic
Similar resources
Spectral Learning from a Single Trajectory under Finite-State Policies
We present spectral methods of moments for learning sequential models from a single trajectory, in stark contrast with the classical literature that assumes the availability of multiple i.i.d. trajectories. Our approach leverages an efficient SVD-based learning algorithm for weighted automata and provides the first rigorous analysis for learning many important models using dependent data. We st...
Probabilistic Deterministic Infinite Automata
We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define a PDFA with an infinite number of states (probabilistic deterministic infinite automata, or PDIA) and show how to average over its connectivity structure and state-specific emission distributions. Given a finite training sequence, posterior inference in the PDIA can ...
Angluin-Style Learning of NFA
This paper introduces NL*, a learning algorithm for inferring non-deterministic finite-state automata using membership and equivalence queries. More specifically, residual finite-state automata (RFSA) are learned similarly to Angluin’s popular L* algorithm, which, however, learns deterministic finite-state automata (DFA). As RFSA can be exponentially more succinct than DFA, RFSA are the preferable...
Phase Transitions of Bounded Satisfiability Problems
We introduce NL*, a learning algorithm for inferring non-deterministic finite-state automata using membership and equivalence queries. More specifically, residual finite-state automata (RFSA) are learned similarly to Angluin’s popular L* algorithm, which, however, learns deterministic finite-state automata (DFA). Like in a DFA, the states of an RFSA represent residual languages. Unlike a DFA, a...
Learning Probabilistic Finite Automata
Stochastic deterministic finite automata have been introduced and are used in a variety of settings. We report here a number of results concerning the learnability of these finite state machines. In the setting of identification in the limit with probability one, we prove that stochastic deterministic finite automata cannot be identified from only a polynomial quantity of data. If concerned wit...